
Infection, Genetics and Evolution

Elsevier BV

Preprints posted in the last 7 days, ranked by how well they match Infection, Genetics and Evolution's content profile, based on 43 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Identifying SARS-CoV-2 Lineages that Share the Same Relative Effective Reproduction Numbers

Musonda, R.; Ito, K.; Omori, R.; Ito, K.

2026-04-24 infectious diseases 10.64898/2026.04.22.26351531 medRxiv
Top 0.1%
3.6%

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continuously evolved since its emergence in the human population in 2019. As of 1st August 2025, more than 1,700 Omicron subvariants have been designated by the Pango nomenclature system. The Pango nomenclature system designates a new lineage based on genetic and epidemiological information of SARS-CoV-2 strains. However, there is a possibility that strains that have similar genetic backgrounds and the same phenotype are given different Pango lineage names. In this paper, we propose a new algorithm, called FindPart-w, which can identify groups of viral lineages that share the same relative effective reproduction numbers. We introduced a new lineage replacement model, called the constrained RelRe model, which constrains groups of lineages to have the same relative effective reproduction numbers. The FindPart-w algorithm searches the equality constraints that minimise the Akaike Information Criterion of constrained RelRe models. Using hypothetical observation count data created by simulation, we found that the FindPart-w algorithm can identify groups of lineages having the same relative effective reproduction number in a practical computational time. Applying FindPart-w to actual real-world data of time-stamped lineage counts from the United States, we found that the Pango lineage nomenclature system may have given different lineage names to SARS-CoV-2 strains even if they have the same relative effective reproduction number and similar genetic backgrounds. In conclusion, this study showed that viruses that had the same relative effective reproduction number were identifiable from temporal count data of viral sequences. These findings will contribute to the future development of lineage designation systems that consider both genetic backgrounds and transmissibilities of lineages.

2
Tracking and predicting the dynamics of HIV-1 epidemics in France using virus genomic data

Colliot, L.; Garrot, V.; Petit, P.; Zhukova, A.; Chaix, M.-L.; Mayer, L.; Alizon, S.

2026-04-24 epidemiology 10.64898/2026.04.21.26351380 medRxiv
Top 0.3%
1.9%

Understanding the dynamics of HIV epidemics is important to control them effectively. Classical methods that mainly rely on occurrence data are limited by the fact that an unknown part of the epidemic eludes sampling. Since the early 2000s, phylodynamic methods have enabled the estimation of key epidemiological parameters from virus genetic sequence data. These methods have the advantage of being less sensitive to partial sampling and of providing insights into epidemic history that even predates the first samples. In this study, we analysed 2,205 HIV sequences from the French ANRS PRIMO C06 cohort. We identified and reconstructed the temporal dynamics of two large clades that represent the HIV-1 epidemics in the country. Using Bayesian phylodynamic inference models, we found that the first clade, from subtype B, originated at the end of the 1970s, grew rapidly during the 1980s before decreasing from 2000 to 2015 and stagnating since then. The second clade, from circulating recombinant form CRF02_AG, emerged and spread in the 1980s, grew again in the early 2000s, before declining slightly. We also estimated key epidemiological parameters associated with each clade. Finally, using numerical simulations, we investigated prospective scenarios and assessed the possibility of meeting the 2030 UNAIDS targets. This is one of the rare studies to analyse the HIV epidemic in France using molecular epidemiology methods. It highlights the value of routine HIV sequence data for studying past epidemic trends or designing public health policies.

3
Rat hepatitis E virus and novel paramyxoviruses in synanthropic rodents and shrews in Kenya

Ochola, G.; Pulkkinen, E.; Ogola, J. G.; Makela, H.; Masika, M.; Vauhkonen, H.; Smura, T.; Jaaskelainen, A. J.; Anzala, O.; Vapalahti, O.; Mweu, A. W.; Forbes, K. M.; Lindahl, J. F.; Laakkonen, J.; Uusitalo, J.; Altan, E.; Korhonen, E. M.; Sironen, T.

2026-04-21 microbiology 10.64898/2026.04.21.719784 medRxiv
Top 1.0%
0.7%

The majority of emerging infectious diseases are zoonotic, having their origin in wildlife before spilling over into the human population. While small mammals are recognized as critical reservoirs for these viruses, their viral diversity remains largely uncharacterized across many African countries. We conducted molecular surveillance of synanthropic rodents and shrews in the Kibera informal settlement in Nairobi and the rural Taita Hills region of Kenya to detect and characterize potential zoonotic viruses. Tissue samples from 228 rodents and shrews were screened for six viral families using PCR assays. Rat hepatitis E virus (HEV) (Rocahepevirus ratti), a rodent-associated virus with potential for human spillover, was identified in Mus musculus and Rattus norvegicus from Kibera. NGS was conducted for the HEV positive samples, and we obtained two near-complete HEV genomes from Rattus norvegicus, which clustered within rodent-associated HEV genotypes in the phylogenetic analysis. The two sequences from the Rattus norvegicus cluster together, indicating a close genetic relationship. Paramyxoviruses belonging to the genera Jeilongvirus and Parahenipavirus were detected both from Taita and Kibera in nine different samples from Rattus norvegicus, Mus minutoides, Crocidura sp and Acomys ignitus. One paramyxovirus positive sample (Acomys ignitus) from Taita was selected for further sequencing with NGS, and a complete genome of a new jeilongvirus was assembled. Phylogenetic analysis of the detected viruses confirmed the close relation to previously known rodent-borne jeilongviruses but also revealed potentially novel jeilong- and parahenipavirus species. Our findings highlight the circulation of potentially zoonotic viruses in both urban and rural small mammals in Kenya. It emphasizes the necessity of continued genomic surveillance of zoonotic viruses to mitigate risks of their spillover into human populations. 
Highlights:
- Surveillance reveals diverse rodent-borne viruses circulating in Kenya.
- Rat-HEV was detected in Rattus norvegicus and Mus musculus from an urban low-income area.
- Paramyxoviruses were detected across multiple rodent and shrew species, including novel Acomys ignitus jeilongvirus.

[Graphical abstract: Figure 1]

4
Oropouche, Dengue, and Chikungunya differential diagnosis. Development and validation of predictive models with surveillance data from Espirito Santo-Brazil.

Nickel Valerio, E. C.; Coli Seidel, G. M.; Da Silva Nunes, R.; Alvarenga Americano do Brasil, P. E.

2026-04-25 infectious diseases 10.64898/2026.04.17.26350875 medRxiv
Top 1%
0.7%

An Oropouche Fever (OF) outbreak has been ongoing in Brazil since 2024. Dengue and chikungunya prediction models are available, but none helps discriminate among dengue, chikungunya, and OF. Objective: This study aims to develop and validate clinical prediction models for dengue, chikungunya, and OF. Methods: This study uses surveillance data from Espirito Santo state, Brazil, from 2023 to 2025. Epidemiological investigations and biological samples were used to classify cases as either (a) clinical-epidemiologically confirmed, (b) laboratory confirmed, or (c) discarded. The predictors were all data related to signs, symptoms, and comorbidities available in the notification forms. The analysis was performed using random forest regression models, one for each outcome, in development and validation datasets. Results: A total of 465,280 observations were analyzed, 261,691 dengue cases (56.6%), 18,676 chikungunya cases (4.0%), 12,174 OF cases (2.6%), and 179,115 discarded cases (38.6%). All three models had good discrimination and moderate to good calibration after scaling prediction. The models retained between 16 and 26 predictors each. Leukopenia and vomiting were the most discriminatory predictors for dengue; arthritis, arthralgia, and rash were the most discriminatory for chikungunya; and epidemiological features were the most relevant for OF. The dengue, chikungunya, and OF models had ROC AUC of 0.726, 0.851, and 0.896 in the validation set, respectively. Conclusion: This research identified the predictors that best discriminate between dengue, chikungunya, and OF. We developed and validated predictive models, one for each condition, with moderate to very good performance, available at https://pedrobrasil.shinyapps.io/INDWELL/. They may be used in diagnostic work-up and arbovirus surveillance.

5
A phylogenetic approach reveals evolutionary aspects and novel genes of bradyzoite conversion in Toxoplasma gondii

C A, A.; Upadhayay, R.; Patankar, S. A.

2026-04-21 bioinformatics 10.64898/2026.04.20.719551 medRxiv
Top 1%
0.5%

Toxoplasma gondii is a widespread human pathogen that has multiple, clinically relevant stages in its complex life cycle, including fast-replicating tachyzoites and latent bradyzoites. Bradyzoite differentiation is triggered by stress responses that lead to changes in transcription, translation, and metabolism. Two aspects of this process are addressed in this report: first, whether proteins that play roles in bradyzoite differentiation are specific to T. gondii and other bradyzoite-forming parasites of the Sarcocystidae family, and second, whether new bradyzoite differentiation proteins can be identified in T. gondii. To answer these questions, a phylogenetic approach was used, comparing proteomes of select members of the Sarcocystidae family that form morphologically different bradyzoite cysts and members of the Eimeriidae family that do not form cysts. This approach resulted in 8 distinct clusters of T. gondii proteins that reflected different conservation patterns; for example, one cluster showed conservation among all organisms, while another showed conservation in bradyzoite cyst-forming organisms. Known T. gondii proteins involved in bradyzoite differentiation were found in all clusters, indicating that this process uses both highly conserved pathways as well as bradyzoite-specific pathways. Importantly, the cluster containing proteins that are conserved in bradyzoite-forming organisms contained several known regulators of bradyzoites, and will be a source for identifying novel T. gondii proteins that are involved in bradyzoite differentiation.

6
Bioprospecting Novel Luciferase Genes from Museum Coleoptera

Bate, J.; Hardinge, P.; Jathoul, A. P.; Wilson, M. R.; Murray, J. A. H.

2026-04-22 biochemistry 10.64898/2026.04.21.719859 medRxiv
Top 2%
0.4%

Museum collections of Coleoptera contain genetic material of potential interest to biotechnology, and non-destructive DNA extraction enables the preservation of important specimens with concomitant release of mitochondrial and genomic DNA. Mini-barcoding of regions of the mitochondrial cytochrome oxidase subunit I (MT-COI) gene helps identify and eliminate known species from further investigation. Here we identify a novel luciferase gene, using Consensus Degenerate Hybrid Oligonucleotide (CODEHOP) primers targeting the region of the luciferase gene spanning the fourth exon, intron, and fifth exon to detect luciferase gene content and eliminate samples containing known luciferase sequences. Biotinylated luciferase gene probes from the firefly Photinus pyralis enabled the enrichment of potential luciferase gene fragments for next-generation sequencing. A bioinformatic analysis suite was then used to identify a luciferase gene sequence from a previously unidentified firefly originally collected in Costa Rica in 2012. We demonstrate that this newly discovered luciferase, termed CRLuc, catalyses a bioluminescent reaction and we determined its emission spectra, Km for the substrates ATP and D-luciferin, and pH stability.

7
Hematological and Molecular Spectrum of Hemoglobinopathies in the Tharu Population of Nepal

Gupta, U. P.; Pokharel, A.; Jadhav, K.; Jadhav, I.; BC, R. K.; Subedi, S.; Gupta, M.

2026-04-26 public and global health 10.64898/2026.04.23.26351569 medRxiv
Top 2%
0.3%

Hemoglobinopathies are inherited disorders of hemoglobin, most notably sickle cell anemia and thalassemia. These conditions result from mutations in the globin genes, leading either to structural abnormalities in the globin chains or to reduced synthesis of normal globin chains. According to the World Health Organization, hemoglobinopathies are a worldwide health problem; in Nepal, they mostly affect the indigenous Tharu groups. Both the global and local rates of illness and death associated with these diseases are on the rise. The objective of this study was to assess the presence of hemoglobinopathies and common mutations of the beta-globin gene within the Tharu population in western Nepal. A cross-sectional study was conducted among 1,400 Tharu individuals enrolled through hospitals in the Banke, Bardiya, and Kailali districts of western Nepal. A thorough hematological analysis was performed using a Sysmex XN-350 analyzer. Hemoglobin variants were detected via high-performance liquid chromatography (HPLC). The molecular characterization of the seven most common β-thalassemia mutations was performed on a subset of 20 confirmed cases using a real-time PCR kit. Hemoglobinopathies were diagnosed in 14.43% of participants (n=202 of 1,400). Sickle cell trait (HbAS) was the most prevalent type of hemoglobinopathy (8.50% of the population), followed by β-thalassemia trait (4.00%). Sickle cell disease (HbSS), HbE trait, and compound heterozygous states were also observed. Hematological parameters differed significantly across types of hemoglobinopathies, and the patterns of microcytic, hypochromic, and hemolytic anemia were also distinct. Commonly documented symptoms included fatigue and joint pain (42.5% and 23.1%, respectively).
Molecular characterization of β-thalassemia cases demonstrated that most individuals were compound heterozygotes, with IVS1-6 (T>C) as the most prevalent variant. The study found that the Tharu population in western Nepal carries a significant burden of hemoglobinopathies (especially sickle cell trait and β-thalassemia), highlighting the need for appropriate screening programs, genetic counseling, and public health strategies to help manage and prevent these conditions in this region.

8
Enteroaggregative Escherichia clade I from Nigeria

Dada, R. A.; Akinlabi, O. C.; Tytler, B. A.; Olayinka, B. O.; Page, A. J.; Thomson, N.; Okeke, I. N.

2026-04-22 microbiology 10.64898/2026.04.21.719883 medRxiv
Top 2%
0.3%

Escherichia coli, the Escherichia type species, is present in mammalian and avian intestinal microbiota, and includes both commensals and pathogens. Other Escherichia species are understudied because they are less commonly associated with human disease and because of a paucity of tools that can correctly delineate them from E. coli. However, other species of this genus, including Escherichia albertii and Escherichia fergusonii, are repeatedly reported as diarrhoeagenic. We hypothesized that some bacteria fitting the definition of enteroaggregative E. coli (EAEC) belong to species other than E. coli. We used phylogeny to determine the species of 2,818 Escherichia genomes from diarrhoea epidemiology studies in Nigeria. Phylogeny-based speciation was confirmed using GTDB-tk and ClermonTyping. Virulence genes were detected using the ARIBA/Virulencefinder database, and multilocus sequence typing was performed using the Achtman scheme. Fourteen non-coli Escherichia genomes were identified: Escherichia clade I ST485 (11), Escherichia ruysiae ST5792 (2), and Escherichia fergusonii ST5636 (1). All the Escherichia clade I ST485 genomes carry the EAEC virulence genes aap, aar, astA and air, as well as the hlyF, eatA, tsh, traT, and chuA virulence genes. Interestingly, 62% of enteroaggregative Escherichia clade I ST485 genomes listed on Enterobase are from African isolates, despite only 3% of genomes overall coming from the continent. Our results suggest that non-coli Escherichia species are infrequently isolated from human stool, but, when they are, they are misidentified as E. coli so that their significance is largely overlooked. Escherichia clade I ST485 is a globally disseminated enteroaggregative Escherichia clade I lineage that is common in Africa.
Author Summary: Escherichia clade I are rarely associated with disease, and because they are difficult to differentiate from Escherichia coli in routine laboratory practice, they are often misidentified as Escherichia coli, leading to underestimation of their impact on the burden of disease. Additionally, some clones of Escherichia clade I also carry genetic markers that have been used to define Enteroaggregative Escherichia coli (EAEC), a cause of persistent diarrhoea in developing countries and travellers' diarrhoea in developed economies. EAEC has also been associated with malnutrition and poor growth among children in developing economies. We here describe clones of Escherichia clade I (ST485) that carry enteroaggregative genes and, in some cases, were recovered from diarrhoeal cases. We show, from genomes deposited on Enterobase and from our study, that this clone is globally disseminated, often associated with human infections, and often misidentified as Escherichia coli. We also describe non-coli Escherichia other than Escherichia clade I isolated from humans. We suggest that the Escherichia clade I clone carrying enteroaggregative genes may be described as Enteroaggregative Escherichia clade I.

9
Evolutionary history of alpha satellite DNA in Cercopithecini: comparative cytogenomics highlights the diversification pattern of primate centromere repeats

Cacheux, L.; Dutrillaux, B.; Gerbault-Seureau, M.; Nicolas, V.; Ponger, L.; Bed'Hom, B.; Escude, C.

2026-04-21 evolutionary biology 10.64898/2026.04.19.719437 medRxiv
Top 2%
0.3%

Background: Alpha satellites, a superfamily of AT-rich tandem repeats, are the primary DNA component of centromeres in Platyrrhini and Catarrhini. Analyses of the human genome suggest that centromeres behave like biological ridges, with new alpha satellite families expanding at the centromere core, splitting and displacing older ones towards the pericentromeres. The Cercopithecini tribe, which displays an unusual chromosomal evolution involving multiple chromosomal fissions and centromere formations, represents a promising model to enhance our understanding of alpha satellite DNA evolutionary history. We previously applied targeted sequencing to centromere DNA from two distant species drawn from the Cercopithecini terrestrial and arboreal lineages, and characterized six alpha satellite families exhibiting varying mean sequence identities. Methods: Combining classical and molecular cytogenetics, we mapped the chromosomal distribution of these alpha satellite families across 13 Cercopithecini, one Papionini, and one Colobinae species. A nuclear marker-based phylogeny provided an evolutionary framework for interpretation. Results: Our phylogeny identifies the terrestrial and arboreal lineages, and a newly designated swamp clade. We observed significant interspecies variations in alpha satellite patterns, including differences in presence/absence and distinct chromosomal distribution patterns (centromeric, pericentromeric, or subtelomeric). Families previously described as heterogeneous (83-87% mean sequence identity) exhibit a centromeric position in the swamp lineage, which is characterized by conserved karyotypes. In contrast, these families show a pericentromeric distribution in the terrestrial and arboreal lineages, replaced at the centromere core by more homogeneous families (95-98% mean sequence identity).
In the arboreal clade, which is characterized by highly fissioned karyotypes, putative evolutionary new centromeres show a unique co-occurrence of highly homogeneous and heterogeneous families. Conclusion & Implications: We propose a comprehensive evolutionary scenario for alpha satellite DNA in Cercopithecini, where younger families arise at the centromere core, shift toward the pericentromeres as they age, and eventually face extinction. Our study suggests that alpha satellite DNA and chromosomes evolve in an interdependent manner, with satellite diversification and displacement occurring in parallel with chromosome fissions and centromere repositioning. This comparative cytogenomic approach provides both support for the human-based evolutionary model for alpha satellite DNA and novel temporal insights into its diversification dynamics. Beyond evolutionary genomics, our findings highlight the potential of alpha satellite DNA to complement systematic studies in deciphering complex primate evolutionary histories.

10
Revisiting the Monascus genus (Eurotiales, Aspergillaceae): A Multilocus Phylogenetic Approach to Species Delimitation

Chen, W.; Chen, S.; Jia, L.; Zhou, Y.; Shao, Y.; Chen, F.

2026-04-21 microbiology 10.64898/2026.04.21.719803 medRxiv
Top 3%
0.2%

Monascus spp. are economically important filamentous fungi that have been utilized in the production of beneficial metabolites such as Monascus pigments and monacolin K, as well as in the brewing of some Asian fermented foods. The delimitation of Monascus species has traditionally relied on phenotypic traits; however, this morphological classification approach is susceptible to subjective judgments and variations in cultural conditions and also may not necessarily be related to the actual genetic relationship. Consequently, synonymy and misidentification frequently occur in Monascus taxonomy, highlighting the urgent need for a convenient and reliable classification system for this genus. In this study, a phylogenetic analysis of 82 representative Monascus strains, encompassing all previously recognized species of the genus, was conducted based on the concordance of five gene genealogies (BenA, CaM, ITS, LSU, and RPB2) to clarify species delimitation and resolve phylogenetic relationships within Monascus. The results revealed that the genus Monascus is resolved into 11 species, which are clustered into two sections: Floridani (including M. argentinensis, M. flavipigmentosus, M. floridanus, M. lunisporas, M. mellicola, M. pallens, and M. recifensis) and Rubri (including M. pilosus, M. purpureus, M. ruber, and M. sanguineus). M. pilosus and M. sanguineus were reaffirmed as distinct species due to their well-supported and divergent phylogenetic lineages. Additionally, M. albidulus, M. anka, M. barkeri, and M. fumeus are synonymized with M. pilosus, while M. aurantiacus and M. rutilus are synonyms of M. purpureus. Finally, a comprehensive list of accepted Monascus species along with their corresponding barcode sequence data is provided.

11
Epidemiology and Predictors of Fluoroquinolone Resistance in ESBL-Producing Escherichia coli: Implications for Empirical Therapy in Mexico

Gallardo Mejia, A.; Almeida, J.

2026-04-22 infectious diseases 10.64898/2026.04.21.26351439 medRxiv
Top 3%
0.2%

Urinary tract infections (UTIs) are among the most common infectious diseases worldwide, with Escherichia coli being the predominant uropathogen. The increasing prevalence of extended-spectrum beta-lactamase (ESBL)-producing strains and their association with fluoroquinolone resistance pose a significant challenge to empirical therapy, particularly in community settings. The aim of this study was to determine the epidemiology and predictive factors associated with ESBL-producing E. coli and its concomitant fluoroquinolone resistance in community-acquired clinical isolates. A retrospective cross-sectional study was conducted analyzing 244 clinical E. coli isolates. Demographic and microbiological data were collected, including age, sex, sample type, and antibiotic susceptibility. Associations between variables and ESBL production were assessed using Pearson's chi-squared test, and odds ratios (ORs) with 95% confidence intervals (CIs) were calculated. Of the isolates, 165 (68%) were ESBL-producing. A significant association was observed between age group and ESBL production (p < 0.001), with the highest frequency in the 20-39 age group. Most ESBL-positive isolates were obtained from women (73%), although the OR analysis suggested a non-significant trend toward a higher probability in men (OR = 1.29; 95% CI: 0.72-2.31). High rates of fluoroquinolone resistance were identified among the ESBL-producing isolates, with 30% resistance to levofloxacin and 35% to ciprofloxacin (p < 0.001). Urine samples showed the highest concentration of ESBL-positive isolates, with a significant association between sample type and resistance (p < 0.001). The high prevalence of ESBL-producing E. coli and its concomitant resistance to fluoroquinolones highlight a critical challenge for the empirical treatment of urinary tract infections in Mexico, underscoring the need to strengthen antimicrobial use management and local surveillance strategies.
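
Editorial illustration (not part of the abstract): the abstract reports an odds ratio with a 95% confidence interval but does not say how the interval was computed. A common choice is the Wald (Woolf logit) interval, sketched below under that assumption. The function name and the cell counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf logit) confidence interval for a 2x2 table.

    Hypothetical layout:     outcome+   outcome-
        exposed                 a          b
        unexposed               c          d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the logit method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT the study's data
or_, lo, hi = odds_ratio_wald_ci(30, 70, 20, 80)
print(f"OR = {or_:.2f}; 95% CI: {lo:.2f}-{hi:.2f}")
```

When the interval spans 1, as the reported 0.72-2.31 does, the association is conventionally read as not statistically significant at the 5% level.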

12
Computational Drug Repurposing Targeting LuxS-Mediated Quorum Sensing in Fusobacterium nucleatum: A Virtual Screening and Molecular Dynamics Approach

Cedeno, K.; De Leon, D.; Chiari, M.

2026-04-21 microbiology 10.64898/2026.04.20.719701 medRxiv
Top 4%
0.1%

Fusobacterium nucleatum is an anaerobic bacterium strongly associated with the development and progression of colorectal cancer (CRC). Its pathogenic mechanisms involve the LuxS/AI-2 quorum sensing (QS) system, which regulates biofilm formation, virulence factor expression, and host immune evasion. Targeting LuxS represents a promising anti-virulence strategy that could disrupt bacterial communication without inducing selective pressure for antibiotic resistance. In this study, we employed a computational drug repurposing pipeline to identify FDA-approved drugs capable of inhibiting the LuxS enzyme in F. nucleatum. We performed structure-based virtual screening of 9,466 compounds from DrugBank using AutoDock Vina against the AlphaFold-predicted LuxS structure (UniProt: A0A133NIU3). From 1,082 initial hits (binding energy ≤ -7.0 kcal/mol), we applied ADMET filtering and composite scoring to select the top 5 candidates. Molecular dynamics simulations (10 ns each) using OpenMM with the AMBER14 force field confirmed the stability of all five protein-ligand complexes (RMSD < 2.0 Å). The most promising candidates include Tubocurarine (ΔG = -16.97 kcal/mol, RMSD = 1.87 Å), Docetaxel (ΔG = -13.22 kcal/mol, RMSD = 1.81 Å), Metyrosine (ΔG = -13.78 kcal/mol, RMSD = 1.97 Å), and Ergometrine (ΔG = -13.22 kcal/mol, RMSD = 1.92 Å). These results constitute an exploratory computational basis that requires subsequent experimental validation through in vitro and in vivo assays, and provide candidates for testing as anti-quorum sensing agents against F. nucleatum, with potential implications for CRC prevention and treatment.

13
Comparative analysis of transposable elements in jellyfish and hydroid species (Cnidaria: Medusozoa)

Mays, A.; Cabrera, F.; Macias-Munoz, A.

2026-04-21 evolutionary biology 10.64898/2026.04.17.719288 medRxiv
Top 4%
0.1%

Background: Transposable elements (TEs) are repetitive genetic elements that can jump to new loci, causing genome expansions and structural rearrangements, and can ultimately propel the evolution of genomes. Despite their significance, the role of TEs in the evolution of genomes and phylogenetic groups remains largely understudied in early diverging lineages. Further, the extent to which TE content varies across species is still an open question. Medusozoa, a group within Cnidaria encompassing jellyfish and hydroids, exhibits an exceptional diversity of life history strategies, body plans, and physiological capabilities. These characteristics, along with its early-diverging phylogenetic position, establish Medusozoa as an ideal system for investigating the composition and evolutionary history of TEs within the group. Results: We generated a custom repeat library built from annotations of 25 Medusozoan genomes and used it to characterize TEs, aiming to identify lineage-specific TE content and activity that may correlate with the diversity observed within the group. We found that repetitive element percentage and genome size varied considerably, with Hydrozoa exhibiting the most variation among classes in both respects. DNA transposons were the most prevalent TE classification in all but two genomes, averaging 28% of all genomes. Intra-genus comparisons revealed a surprising degree of differences in TE content. In the genus Aurelia, the expansion of a single DNA transposon superfamily accounted for much of the difference in repetitive element percentage between two species, whereas in the genus Turritopsis, a similar divergence resulted from the proliferation of multiple superfamilies. Interestingly, most genomes showed evidence of recent TE expansions, suggesting ongoing activity in many medusozoan species. Conclusion: We present the first comparative analysis of TEs across all medusozoan classes.
Our results reveal class-specific TE dynamics and highlight cases of TE proliferations as lineages diverge. This research provides data on TE activity and diversity that can be used as a resource for future study and fills important gaps in our understanding of TEs in early diverging animal lineages.

14
Unlocking a flexible set of phylogenetic models for discrete and continuous trait evolution using discretized stochastic diffusion

Revell, L. J.; Alencar, L. R. V.; Alfaro, M. E.; Dain, J.; Hill, N. J.; Jones, M.; Martinet, K. M.; Romero-Alarcon, V.; Harmon, L. J.

2026-04-21 evolutionary biology 10.64898/2026.04.20.719455 medRxiv
Top 5%
0.1%

The practical utility of many modern phylogenetic comparative methods can depend on how accurately mathematical models capture the evolutionary process of traits. Boucher and Demery (2016) described a new quantitative trait model, Brownian motion with reflective limits, that they anticipated might be of use in testing hypotheses about a particular sort of constraint on phenotypic character evolution. Since their analytic solution for the probability function under this bounded evolutionary scenario was not practical to evaluate for reasonably-sized trees, Boucher and Demery (2016) also identified a creative technique for computing the likelihood of their model. The basis of this methodology derives from the convergence of an equal-rates, symmetric, ordered Markov chain and continuous stochastic diffusion in the limit as the number of steps in our chain goes to ∞ (or, alternatively, as their widths decrease towards zero). We refer to this convergence in the limit as the discretized diffusion approximation or (more compactly) the discrete approximation. We realized that this discrete approximation of Boucher and Demery (2016) unlocked a number of additional models for the phylogenetic comparative analysis of discrete and continuous trait data, and we explore several of these in the present article. Specifically, we examine application of this discretized diffusion approximation to the threshold model from evolutionary quantitative genetics, to a new "semi-threshold" trait evolution model, to a joint model of discrete and continuous traits in which the discrete trait influences the rate of evolution of our continuous character, as well as a model where precisely the converse is true, and to a discrete character dependent multi-trend trended continuous trait evolution model. We conclude with some context for the origins of our article and discussion of other possible applications of this powerful approach.
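
Editorial illustration (not part of the abstract, and not the authors' implementation): the discretized diffusion idea can be sketched for a single branch. The trait range is cut into equal-width bins, and the chain moves to each neighbouring bin at rate sigma2 / (2 * delta^2), where delta is the bin width, so that the chain's variance grows as sigma2 * t while it is away from the limits. The function name, parameter names, and the use of uniformization to evaluate the transition probabilities are all assumptions made for this sketch.

```python
import math


def discretized_bm(x0, sigma2, t, lower, upper, n_bins=200, tol=1e-12):
    """Distribution at time t of Brownian motion with reflective limits,
    approximated by an ordered, symmetric, equal-rates Markov chain.

    Returns (bin_midpoints, probabilities). Hypothetical helper for
    illustration; assumes n_bins >= 2 and that exp(-lam) does not underflow.
    """
    delta = (upper - lower) / n_bins      # bin width
    rate = sigma2 / (2.0 * delta ** 2)    # rate of moving to each neighbouring bin
    mids = [lower + (i + 0.5) * delta for i in range(n_bins)]
    start = min(max(int((x0 - lower) / delta), 0), n_bins - 1)

    # Uniformization: P(t) = sum_k Poisson(k; lam) * M^k with M = I + Q / (2 * rate).
    # Under M an interior bin moves left or right with probability 1/2 each;
    # an edge bin stays put with probability 1/2 (the reflective limit).
    lam = 2.0 * rate * t
    p = [0.0] * n_bins
    p[start] = 1.0
    result = [0.0] * n_bins
    weight, total, k = math.exp(-lam), 0.0, 0
    while total < 1.0 - tol and k < 100000:
        for i in range(n_bins):
            result[i] += weight * p[i]
        total += weight
        nxt = [0.0] * n_bins
        for i in range(n_bins):
            left = p[i - 1] if i > 0 else p[0]
            right = p[i + 1] if i < n_bins - 1 else p[n_bins - 1]
            nxt[i] = 0.5 * (left + right)
        p = nxt
        k += 1
        weight *= lam / k                 # next Poisson weight
    return mids, result


# Limits placed far from the start, so the result should resemble
# unbounded Brownian motion with variance sigma2 * t.
mids, probs = discretized_bm(x0=0.0, sigma2=1.0, t=1.0, lower=-10.0, upper=10.0)
mean = sum(m * q for m, q in zip(mids, probs))
var = sum((m - mean) ** 2 * q for m, q in zip(mids, probs))
print(f"total probability = {sum(probs):.6f}, variance = {var:.6f}")
```

This covers only the transition probabilities along one branch. A likelihood on a tree would combine such transition probabilities across branches, for example with a pruning algorithm, which is not shown here.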

15
A Ribosomal Marker-Based Metataxonomic Framework for Environmental Surveillance of Nematodes of Public Health Importance

Zuluaga, J. P.; Bedoya-Urrego, K.; Alzate, J. F.

2026-04-22 microbiology 10.64898/2026.04.21.720024 medRxiv
Top 5%
0.1%

Metataxonomic analysis targeting the V4 region of the 18S rDNA gene, combined with molecular phylogenetic inference, was applied to detect nematode DNA of public health relevance in environmental matrices. A total of 25 mOTUs corresponding to six nematode taxa were detected in environmental samples from the Andean region of Colombia. Analysis of 12 water and sludge samples from wastewater treatment plants, 5 artisanal agricultural bioinputs, and 3 food samples revealed multiple species of public health significance: Trichuris trichiura, Enterobius vermicularis, Ascaris spp., and Necator americanus. We also confirmed zoonotic species, including Angiostrongylus cantonensis and Trichinella spp. These findings demonstrate that combining metataxonomics with molecular phylogeny provides a scalable molecular framework for the environmental surveillance of parasitic nematodes, overcoming the limitations of traditional morphological identification methods. This approach offers a replicable model for strengthening control and monitoring programs for parasitism in human populations.

16
Tuberculosis in households with infectious cases in Kampala city: Harnessing health data science for new insights on an ancient disease with persistent, unresolved problems (DS-IAFRICA TB) study protocol

Nassinghe, E.; Musinguzi, D.; Takuwa, M.; Kamulegeya, R.; Nabatanzi, R.; Namiiro, S.; Mwikirize, C.; Katumba, A.; Kivunike, F. N.; Ssengooba, W.; Nakatumba-Nabende, J.; Kateete, D. P.

2026-04-25 infectious diseases 10.64898/2026.04.23.26351571 medRxiv
Top 5%
0.1%

Tuberculosis (TB) is prevalent in Uganda and overlaps with a high rate of HIV/TB coinfection. While nearly all hospital-based TB cases in Kampala, the capital of Uganda, show clear TB symptoms, 30% or more of undiagnosed TB cases found through active screening are asymptomatic. Additionally, the host risk factors for TB in Kampala cannot be distinguished from environmental risk factors. These TB-specific challenges are just part of the complexity, especially in areas with high HIV/AIDS burden. Data science techniques, especially Artificial Intelligence (AI) and Machine Learning (ML) algorithms, could help untangle this complexity by identifying factors related to the host, pathogen, and environment, which are difficult to explain or predict with traditional/conventional methods. In this project, we will use health data science approaches (AI/ML) to identify factors driving TB transmission within households and reasons for anti-TB treatment failure. We will utilize the computational resources at Makerere University and available demographic, clinical, and laboratory data from TB patients and their contacts to develop AI and ML algorithms. These will aim to: (1) identify patients at baseline (month 0) unlikely to convert their sputum or culture results by months 2 and 5, thus at risk of failing TB treatment; (2) identify household contacts of TB cases who are at risk of developing TB disease, as well as contacts who may resist TB infection despite repeated exposure to M. tuberculosis. Achieving these objectives will provide evidence that data science methods are effective for early detection of potential TB cases and high-risk patients, thereby helping to reduce TB transmission in the community. The study protocol received approval from the School of Biomedical Sciences IRB, protocol number SBS-2023-495.

17
Molecular epidemiology of rifampicin resistant Mycobacterium tuberculosis in Vietnam

Solomon, O. E.; Nguyen, V. N.; Nguyen, H. B.; Nguyen, T. A.; MacLean, E. L.-H.; Fox, G. J.; Behr, M. A.

2026-04-27 infectious diseases 10.64898/2026.04.20.26351312 medRxiv
Top 5%
0.1%

Background: Vietnam is a top 20 burden country for multi-drug resistant/rifampicin-resistant tuberculosis (MDR/RR-TB), with nearly 10,000 cases a year. With the emergence of new diagnostic assays for M. tuberculosis and resistance, along with new drugs for both treatment and prevention, we sought to better understand the molecular epidemiology of RR-TB in this high-burden setting, through the study of clinical trial isolates from the VQUIN MDR trial. Methods: We assembled a sample of cultured isolates, collected from patients with confirmed RR-M. tuberculosis within 10 provinces, enriching for isolates from outside of the 2 major cities, Hanoi and Ho Chi Minh City. We subjected these isolates to whole genome sequencing (WGS) and bioinformatic analysis, with a subset subject to phenotypic drug susceptibility testing to evaluate phenotypic/genotypic concordance. New genome sequences were phylogenetically contextualised to publicly available M. tuberculosis genome sequences sampled in Vietnam from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). Results: Isolates from 252 RR-TB cases passed quality controls and were available for analysis. Xpert MTB/RIF had a high concordance with WGS-based rifampicin-resistance prediction (PPV=96.8%). Of the 244 isolates confirmed to be rifampicin resistant, a high proportion (235/244 = 96.3%) had mutations associated with resistance to at least one other first- or second-line antibiotic. Phenotypic drug susceptibility testing (DST) for rifampicin, isoniazid, and levofloxacin was completed for 77 isolates, with a high concordance demonstrated between DST and genomic-based resistance predictions (67/77, 87.0% RIF; 76/77, 98.7% INH; 73/77, 94.8% LFX). High concordance was also observed with the new and repurposed antibiotics linezolid (60/60, 100%), pretomanid (60/60, 100%), and bedaquiline (56/60, 93.3%). Rifampicin-resistant strains were more likely to be lineage 2.2.1, compared to rifampicin-susceptible M. tuberculosis strains in Vietnam, particularly in the major cities. Conclusions: The high prevalence of secondary drug-resistance beyond RIF and INH, along with the dominance of one major lineage across geographic regions, provides insights on the spread of MDR/RR-TB in Vietnam and reinforces the importance of prompt and broad detection of drug-resistance to inform the timely initiation of effective drug regimens.

18
Assessing medication-related burden and medication adherence among older patients from Central Nepal: A machine learning approach

Giri, R.; Agrawal, R.; Lamichhane, S. R.; Barma, S.; Mahatara, R.

2026-04-23 geriatric medicine 10.64898/2026.04.22.26351447 medRxiv
Top 5%
0.0%

We are pleased to submit our original article entitled "Assessing medication-related burden and medication adherence among older patients from Central Nepal: A machine learning approach" for consideration in your esteemed journal. In this paper, we assessed medication burden using the validated Living with Medicines Questionnaire (LMQ-3) and medication adherence using the Adherence to Refills and Medications Scale (ARMS). We analysed our results with a machine learning approach instead of a traditional statistical approach to identify the complex factors influencing both. Six ML architectures (Ordinary Least Squares, LightGBM, Random Forest, XGBoost, SVM, and penalized linear regression) were employed to predict ARMS and LMQ scores using various socio-demographic, clinical, and medication-related predictive features. Model explainability was provided through SHAP (SHapley Additive exPlanations). Our study identified a moderate medication burden with moderate non-adherence among older adults. Requiring assistance with medication and polypharmacy were the strongest drivers of medication burden and non-adherence. The high predictive accuracy of the ML models suggests that appropriate clinical interventions, such as deprescribing, are needed to cope with the highly prevalent medication burden and non-adherence among older adults in Nepal.

19
Ethanolic Extract of Polish Propolis exhibits synergy with selected antifungal agents against yeast pathogens causing candidiasis

Bollin, P.; Pieranski, M. K.; Kus, P. M.; Van Dijck, P.; Szweda, P.

2026-04-22 microbiology 10.64898/2026.04.21.719917 medRxiv
Top 6%
0.0%

Candidiasis poses a serious health threat, stimulating efforts to develop new antifungal agents and alternative therapies. Given the high mortality of fungal infections and the historical use of natural remedies, there is growing interest in integrating natural substances into modern treatments. It is particularly important to explore interactions between home remedies and clinically approved antifungals to avoid harmful combinations or enhance beneficial effects. In this study, the chemical composition of the ethanolic extract of propolis (EEP) was analyzed using UHPLC-DAD-QqTOF-MS. The interactions of this extract with several antifungal agents against four yeast pathogens causing candidiasis (Candida albicans, Nakaseomyces glabratus, Pichia kudriavzevii, and Candida auris) were investigated using a checkerboard titration assay, growth kinetics, and a disc-diffusion assay. A novel simulated infection model was also proposed. The results showed synergistic interactions between EEP and amphotericin B, and additive effects with nystatin. Synergy and additivity with fluconazole and voriconazole were observed, but limited to C. albicans and N. glabratus. In contrast, antagonistic interactions were noted with caspofungin, clotrimazole, and ketoconazole, which may have clinical relevance. Additionally, positive interactions with 2-phenoxyethanol and silver nanoparticles (AgNPs) suggest potential practical applications. The synergistic properties of propolis could expand antifungal strategies and support the development of multi-target, resistance-preventing therapies.

20
Reveal Principles of Codon Optimization via Machine Learning

Deng, F.; Li, H.; Sun, D.; Duan, G.; Sun, Z.; Xue, G.

2026-04-21 bioinformatics 10.64898/2026.04.16.718958 medRxiv
Top 6%
0.0%

A high level of protein expression is usually desirable in industry and research, and codon optimization is widely used to achieve it. Methods for codon optimization fall into two branches: classical methods, which develop cost functions based on empirical rules, and AI methods, which use neural networks to learn the principles of codon choice from endogenous genes. Here we develop two codon optimization tools, one from each branch, namely OptimWiz 2.1 and OptimWiz 3.0. Results of fusion protein fluorescence detection indicate that both OptimWiz 2.1 and OptimWiz 3.0 are superior to all the other commercially available codon optimization tools. Principles of codon optimization are revealed in the process of machine learning on both tools.